Perceptual Evaluation of Cost for Segment Selection in Concatenative Speech Synthesis

Authors

  • Tomoki Toda
  • Hisashi Kawai
  • Minoru Tsuzaki
  • Kiyohiro Shikano
Abstract

In segment selection for concatenative Text-to-Speech (TTS), it is important to use a cost that corresponds to perceptual characteristics. We clarify how the cost corresponds to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from the results of perceptual experiments on the naturalness of synthetic speech. The results show that the average cost, which reflects the naturalness degradation over the entire synthetic speech, corresponds better to the perceptual scores than the maximum cost, which reflects the local naturalness degradation. Furthermore, the RMS (Root Mean Square) cost, which is affected by both the average cost and the maximum cost, has the best correspondence.
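The three integration functions named in the abstract can be illustrated with a minimal sketch. It assumes each selected segment already has a non-negative local cost; the function names and the example cost values below are hypothetical and are not taken from the paper, and the paper's exact definitions (for example any weighting of sub-costs) may differ.

```python
import math

def average_cost(local_costs):
    """Mean of the local costs: degradation spread over the whole utterance."""
    return sum(local_costs) / len(local_costs)

def maximum_cost(local_costs):
    """Largest local cost: the single worst local degradation."""
    return max(local_costs)

def rms_cost(local_costs):
    """Root mean square of the local costs: grows with both the mean and the peaks."""
    return math.sqrt(sum(c * c for c in local_costs) / len(local_costs))

# Two hypothetical segment sequences with the same average cost.
smooth = [0.5, 0.5, 0.5, 0.5]   # evenly distributed degradation
spiky = [0.2, 0.2, 0.2, 1.4]    # one strongly degraded segment

for name, costs in (("smooth", smooth), ("spiky", spiky)):
    print(name, average_cost(costs), maximum_cost(costs), rms_cost(costs))
```

In this toy case the average cost cannot tell the two sequences apart, the maximum cost looks only at the worst segment, and the RMS cost lies between the two, which matches the abstract's description of it as being affected by both.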

Similar resources

An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis

In this paper, we evaluate various cost functions for selecting a segment sequence in terms of the correspondence between the cost and perceptual scores to the naturalness of synthetic speech. The results demonstrate that the conventional average cost, which shows the degradation of naturalness over the entire synthetic utterance, has better correspondence to the perceptual scores than the maxi...

Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations

This paper describes optimizing a cost function for segment selection in concatenative Text-to-Speech based on perceptual characteristics. We use the norm of a local cost for each segment as an integrated cost function for a segment sequence to consider both the degradation of naturalness over the entire synthetic speech and the local degradation. The cost function is optimized by adjusting not...

Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech

This paper presents an approach using phonetic context similarity as a cost function in unit selection of concatenative Text-to-Speech. The approach measures the degree of similarity between the desired context and the candidate segment under different phonetic contexts. It considers the impact from relatively far contexts when plenty of candidates are available and can take advantage of the dat...

Automatic feature selection for acoustic-visual concatenative speech synthesis: towards a perceptual objective measure

We present an iterative algorithm for automatic feature selection and weight tuning of target cost in the context of unit selection based audio-visual speech synthesis. We perform feature selection and weight tuning for a given unit-selection corpus to make the ranking given by the target cost function consistent with the ordering given by an objective dissimilarity measure. We explicitly perfo...

Segment selection considering local degradation of naturalness in concatenative speech synthesis

In this paper, we investigate the effect of using a novel cost, RMS (Root Mean Square) cost, for segment selection for concatenative Text-to-Speech. The RMS cost is affected not only by the total degradation of naturalness but also by the local degradation of naturalness. From the results of experiments comparing this approach with segment selection based on a conventional average cost, it is f...

Publication date: 2002